Abstract. The main objective of this paper is to show how classical probabilistic methods such as Maximum Entropy (ME), maximum likelihood (ML) and/or Bayesian (BAYES) approaches can be used to fuse microscopic and macroscopic data. ME can be used to assign a probability law to an unknown quantity when we have macroscopic data (expectations) on it. ML can be used to estimate the parameters of a probability law when we have microscopic data (direct observations). BAYES can be used to update a prior probability law, through the likelihood, when we have microscopic data. When we have both microscopic and macroscopic data, we can first use ME to assign a prior and then use BAYES to update it to the posterior law, thus performing the desired data fusion. However, in practical data fusion applications, some engineering judgment may still be needed to propose realistic data fusion solutions. Some simple examples in sensor data fusion and image reconstruction using different kinds of data are presented to illustrate these ideas.

key words: Data fusion, Maximum entropy, Maximum likelihood, Bayesian data fusion, 
EM algorithm. 

1. Introduction 

Data fusion is an active area of research in many applications such as nondestructive testing (NDT), geophysical imaging, medical imaging, radio astronomy, etc. Our main objective in this paper is not to focus on any of these applications. Rather, we want to show how classical probabilistic methods such as Maximum Entropy (ME), maximum likelihood (ML) and/or Bayesian (BAYES) approaches can be used to do data fusion.

First, we consider these three methods separately and briefly describe each of them. Then we point out some interrelations between them.

We will see that ME can be used to assign a probability law to an unknown quantity X when we have macroscopic data (expectations) on it. ML can be used when, before getting the data, we have assigned a parametric probability law to X and we want to estimate its parameters from some microscopic data (samples of X). BAYES can be used to update probability laws, going from priors to posteriors.

When we have both microscopic and macroscopic data, we can first use ME to assign a prior and then use BAYES to update it to the posterior law, thus doing the desired data fusion. In practical data fusion applications, however, some engineering judgment may still be needed to propose realistic data fusion solutions.



2. Short description of the methods 

2.1. Maximum Entropy (ME) 

ME can be used to assign a probability law to an unknown quantity when we have macroscopic data (expectations) on it. To see this, let X denote a quantity of interest; we show when and how ME can be used through the following problem.

Problem P1: We have L sensors giving us L values $\{\mu_l, l = 1, \ldots, L\}$ representing the mean values of L known functions $\{\phi_l(X), l = 1, \ldots, L\}$ related to the unknown X:

$$E\{\phi_l(X)\} = \int \phi_l(x)\, p(x)\, dx = \mu_l, \qquad l = 1, \ldots, L. \qquad (1)$$

The question is then how to represent our partial knowledge of X by a probability 
law. 

Obviously, this problem does not have a unique solution. Actually, these data define a class of possible solutions, and we need a criterion to select one of them. The ME principle gives us this criterion, and the problem then becomes:



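$$\text{maximize } S(p) = -\int p(x) \ln p(x)\, dx \quad \text{subject to} \quad \int \phi_l(x)\, p(x)\, dx = \mu_l, \quad l = 1, \ldots, L.$$

The solution is given by

$$p(x) = \frac{1}{Z(\theta)} \exp\left[-\sum_{l=1}^{L} \theta_l \phi_l(x)\right] = \frac{1}{Z(\theta)} \exp\left[-\theta^t \phi(x)\right], \qquad (2)$$

where

$$Z(\theta) = \int \exp\left[-\sum_{l=1}^{L} \theta_l \phi_l(x)\right] dx \qquad (3)$$

is the partition function and $\theta_1, \ldots, \theta_L$ are determined by the following system of equations:

$$-\frac{\partial \ln Z(\theta)}{\partial \theta_l} = \mu_l, \qquad l = 1, \ldots, L. \qquad (4)$$

See [...] for more discussion.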
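As a rough numerical illustration of (2) to (4), here is a minimal Python/NumPy sketch, assuming X takes values on a finite grid so that the integrals become sums. The function name `max_entropy` and the die-like data (X in {1, ..., 6} with a known mean of 4.5) are hypothetical choices for illustration. Equation (4) is solved with Newton's method, using the fact that the Jacobian of $E\{\phi(X)\}$ with respect to $\theta$ is minus the covariance of $\phi(X)$.

```python
import numpy as np


def max_entropy(support, phi, mu, iters=50):
    """ME law on a finite support, solving eq. (4) by Newton's method.

    support : (K,) array of the values X can take (a discrete grid, an assumption)
    phi     : callable mapping support -> (K, L) array of phi_l(x)
    mu      : (L,) array of measured expectations mu_l
    """
    F = phi(support)
    theta = np.zeros(F.shape[1])

    def gibbs(th):
        # p(x) = exp[-theta^t phi(x)] / Z(theta), eq. (2); shifted for stability
        a = -F @ th
        w = np.exp(a - a.max())
        return w / w.sum()

    for _ in range(iters):
        p = gibbs(theta)
        mean = p @ F                                    # E{phi(X)} = -d ln Z / d theta
        C = (F - mean).T @ (p[:, None] * (F - mean))    # Cov{phi(X)}
        # d E{phi} / d theta = -C, so Newton on E{phi} - mu = 0 gives:
        theta = theta + np.linalg.solve(C, mean - mu)
    return gibbs(theta), theta


support = np.arange(1, 7, dtype=float)
p, theta = max_entropy(support, lambda x: x[:, None], np.array([4.5]))
print(np.round(p, 4), theta)
```

Because the constrained mean 4.5 is above the uniform mean 3.5, the resulting $\theta$ is negative and the ME probabilities increase with x.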
2.2. Maximum Likelihood (ML)



Problem P2: Assume now that we have a parametric form $p(x;\theta)$ of the probability law and that a sensor gives us N values $x = [x_1, \ldots, x_N]$ of X. How can we determine the parameters $\theta$?

Two classical methods for solving this problem are: 

— Moments Method (MM): The main idea is to write a set of (at least L) equations relating the theoretical and empirical moments, and to solve them to obtain the solution:

$$G_l(\theta) = E\{X^l\} = \int x^l\, p(x;\theta)\, dx = \frac{1}{N} \sum_{j=1}^{N} x_j^l, \qquad l = 1, \ldots, L. \qquad (5)$$



— Maximum Likelihood (ML): Here, the main idea is to consider the data as N samples of X. Then, writing the expression of $p(x;\theta)$ and considering it as a function of $\theta$, the ML solution is defined as

$$\hat\theta = \arg\max_\theta \{l(\theta|x)\} \quad \text{with} \quad l(\theta|x) = p(x;\theta) = \prod_{j=1}^{N} p(x_j;\theta). \qquad (6)$$
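To contrast (5) and (6), the following sketch applies both methods to a Gamma law with shape k and scale s, which is a hypothetical choice (the parameter values, sample size and grid are arbitrary). The moments method solves the two moment equations in closed form. For ML, the scale is profiled out analytically ($\hat s = \bar x / k$ for a fixed k) and the shape is found by a simple grid search, which is a convenience here rather than a recommended optimizer.

```python
import math

import numpy as np

# Hypothetical example: X ~ Gamma(shape k, scale s), theta = (k, s).
rng = np.random.default_rng(0)
k_true, s_true = 3.0, 2.0
x = rng.gamma(shape=k_true, scale=s_true, size=20_000)
N = x.size

# Moments method, eq. (5) with L = 2:  E{X} = k s,  E{X^2} = k (k + 1) s^2
m1, m2 = x.mean(), np.mean(x ** 2)
s_mm = (m2 - m1 ** 2) / m1          # since m2 - m1^2 = k s^2
k_mm = m1 / s_mm

# Maximum likelihood, eq. (6). For fixed k the likelihood is maximized by
# s = m1 / k, which leaves a one-dimensional profile log-likelihood in k.
sum_log = np.sum(np.log(x))


def profile_log_lik(k):
    s = m1 / k
    return (k - 1) * sum_log - N * k - N * k * math.log(s) - N * math.lgamma(k)


ks = np.linspace(0.5, 10.0, 19_001)   # grid search over the shape parameter
k_ml = ks[np.argmax([profile_log_lik(k) for k in ks])]
s_ml = m1 / k_ml
print(f"MM: k={k_mm:.3f}, s={s_mm:.3f}   ML: k={k_ml:.3f}, s={s_ml:.3f}")
```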



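It is interesting to note that, in the case of the generalized exponential families

$$p(x;\theta) = \frac{1}{Z(\theta)} \exp\left[-\sum_{l=1}^{L} \theta_l \phi_l(x)\right] = \frac{1}{Z(\theta)} \exp\left[-\theta^t \phi(x)\right], \qquad (7)$$

we have

$$l(\theta) = \prod_{j=1}^{N} p(x_j;\theta) = \frac{1}{Z^N(\theta)} \exp\left[-\sum_{j=1}^{N} \sum_{l=1}^{L} \theta_l \phi_l(x_j)\right]. \qquad (8)$$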




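Then, it is easy to see that the ML solution is the solution of the following system of equations:

$$-\frac{\partial \ln Z(\theta)}{\partial \theta_l} = \frac{1}{N} \sum_{j=1}^{N} \phi_l(x_j), \qquad l = 1, \ldots, L. \qquad (9)$$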



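Comparing equations (4) and (9), we can remark an interesting relation between these two methods: the ML estimate in the exponential family (7) is the ME solution (2) in which the macroscopic data $\mu_l$ are replaced by the empirical means $\frac{1}{N}\sum_{j} \phi_l(x_j)$. See also [...] for more discussion.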
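The relation between (4) and (9) can be checked numerically. In this sketch, assuming a hypothetical law on {1, ..., 6} with $\phi(x) = x$, $\hat\theta$ is obtained by solving (9), that is, the ME moment equation with the empirical mean in place of $\mu$; the result is compared with a brute-force maximization of the logarithm of (8) on a grid.

```python
import numpy as np

support = np.arange(1, 7, dtype=float)   # hypothetical support of X
theta_true = -0.3                         # hypothetical true parameter


def probs(theta):
    w = np.exp(-theta * support)          # exp[-theta phi(x)] with phi(x) = x
    return w / w.sum()


def mean_phi(theta):
    return probs(theta) @ support         # equals -d ln Z / d theta


rng = np.random.default_rng(1)
x = rng.choice(support, size=5000, p=probs(theta_true))
emp_mean = x.mean()                        # right-hand side of eq. (9)

# Solve eq. (9): mean_phi is decreasing in theta, so bisection suffices.
lo, hi = -5.0, 5.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if mean_phi(mid) > emp_mean:
        lo = mid                           # mean too large: increase theta
    else:
        hi = mid
theta_ml = 0.5 * (lo + hi)


def log_lik(theta):
    # logarithm of eq. (8): -N ln Z(theta) - theta * sum_j x_j
    log_Z = np.log(np.sum(np.exp(-theta * support)))
    return -x.size * log_Z - theta * x.sum()


grid = np.linspace(-2.0, 2.0, 40_001)
theta_grid = grid[np.argmax([log_lik(t) for t in grid])]
print(theta_ml, theta_grid)
```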

2.3. ML and Incomplete Data: EM Algorithm

Problem P3: Consider problem P2, but now assume that the sensor gives us M values $y = [y_1, \ldots, y_M]$ related to the N samples $x = [x_1, \ldots, x_N]$ of X by a non-invertible relation $y = Ax$ with $M < N$. How can we determine $\theta$?

The solution here is still based on ML. The only difference is the way the solution is computed. In fact, we can write

$$p(x;\theta) = p(x|y;\theta)\, p(y;\theta), \qquad \forall x: Ax = y. \qquad (10)$$
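Taking the logarithm of both sides and then the expectation with respect to $p(x|y;\theta')$ for a given value $\theta'$, we have

$$\ln p(y;\theta) = E_{x|y;\theta'}\{\ln p(x;\theta)\} - E_{x|y;\theta'}\{\ln p(x|y;\theta)\}, \qquad (11)$$

or, written differently,

$$L(\theta) = Q(\theta;\theta') - V(\theta;\theta'). \qquad (12)$$

Note that for a given $\theta'$ and for all $\theta$ we have

$$L(\theta) - L(\theta') = \left[Q(\theta;\theta') - Q(\theta';\theta')\right] + \left[V(\theta';\theta') - V(\theta;\theta')\right]. \qquad (13)$$

Now, by Jensen's inequality,

$$V(\theta;\theta') \le V(\theta';\theta'), \qquad (14)$$

so the second bracket in (13) is nonnegative, and any $\theta$ that increases $Q(\cdot\,;\theta')$ also increases the likelihood. An iterative algorithm, known as Expectation-Maximization (EM), is then derived:

$$\text{E:} \quad Q(\theta;\theta^{(k)}) = E_{x|y;\theta^{(k)}}\{\ln p(x;\theta)\}$$
$$\text{M:} \quad \theta^{(k+1)} = \arg\max_\theta \{Q(\theta;\theta^{(k)})\} \qquad (15)$$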



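Each iteration of this algorithm does not decrease the likelihood; under mild regularity conditions, it converges to a stationary point, typically a local maximum, of the likelihood.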

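It is interesting to see that, in the case of the generalized exponential families (7), the algorithm becomes:

$$\text{Step E:} \quad Q(\theta;\theta') = E_{x|y;\theta'}\{\ln p(x;\theta)\} = -N \ln Z(\theta) - \sum_{j=1}^{N} \theta^t E_{x|y;\theta'}\{\phi(x_j)\}$$

$$\text{Step M:} \quad -\frac{\partial \ln Z(\theta)}{\partial \theta_l} = \frac{1}{N} \sum_{j=1}^{N} E_{x|y;\theta'}\{\phi_l(x_j)\}, \qquad l = 1, \ldots, L. \qquad (16)$$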



Comparing this last equation with (4) and (9) shows again the relations between ME, ML and the EM algorithm: the empirical means of (9) are simply replaced by their conditional expectations given y.
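A minimal sketch of EM in the form (16), under hypothetical assumptions: the $x_j$ are i.i.d. Gaussian $N(m, v)$, so that $\phi(x) = (x, x^2)$, and the sensor only reports the sums of consecutive pairs, $y_k = x_{2k-1} + x_{2k}$, which is a non-invertible $y = Ax$ with $M = N/2$. For this toy case the ML estimate from y is also available in closed form ($\hat m = \bar y / 2$, $\hat v = \widehat{\mathrm{var}}(y)/2$), which lets us check where EM converges; the stored log-likelihood values illustrate the monotonicity that follows from (13) and (14).

```python
import numpy as np

# Hypothetical setup: x_1..x_N i.i.d. N(m, v); only pairwise sums are observed.
rng = np.random.default_rng(2)
m_true, v_true, N = 1.0, 4.0, 20_000
x = rng.normal(m_true, np.sqrt(v_true), size=N)
y = x.reshape(-1, 2).sum(axis=1)          # y_k = x_{2k-1} + x_{2k}


def log_lik_y(m, v):
    # ln p(y; theta): each y_k ~ N(2m, 2v)
    return np.sum(-0.5 * np.log(2 * np.pi * 2 * v) - (y - 2 * m) ** 2 / (4 * v))


m, v = 0.0, 1.0                           # initial guess theta^(0)
history = [log_lik_y(m, v)]
for _ in range(60):
    # E-step: given y_k, each x in the pair is N(y_k / 2, v / 2), so the
    # expected sufficient statistics phi(x) = (x, x^2) are:
    Ex = y / 2
    Ex2 = v / 2 + (y / 2) ** 2
    # M-step: moment matching with the expected statistics (cf. eq. 16).
    # Both x's of a pair share these expectations, so averaging over pairs
    # equals averaging over all N samples.
    m = Ex.mean()
    v = Ex2.mean() - m ** 2
    history.append(log_lik_y(m, v))

print(m, v, y.mean() / 2, y.var() / 2)
```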

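Problem P4: Consider now the same problem P3, where we want to estimate not only $\theta$ but also x. We can still use the EM algorithm with the following modification:

$$\text{E:} \quad Q(\theta;\theta^{(k)}) = E_{x|y;\theta^{(k)}}\{\ln p(x;\theta)\}, \qquad \hat x^{(k)} = E_{x|y;\theta^{(k)}}\{x\} \qquad (17)$$
$$\text{M:} \quad \theta^{(k+1)} = \arg\max_\theta \{Q(\theta;\theta^{(k)})\}$$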
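For problem P4, a sketch under hypothetical assumptions: $x \sim N(m\mathbf{1}, vI)$ with $\theta = (m, v)$, a known random matrix A with $M < N$, and no noise. The E-step yields both the expected sufficient statistics and the estimate $\hat x^{(k)}$ of (17), using the standard Gaussian conditioning results $E\{x|y\} = m\mathbf{1} + A^t(AA^t)^{-1}(y - mA\mathbf{1})$ and $\mathrm{Cov}\{x|y\} = v\,(I - A^t(AA^t)^{-1}A)$. Note that $\hat x^{(k)}$ always satisfies $A\hat x^{(k)} = y$; $\theta$ only decides how the null space of A is filled.

```python
import numpy as np

# Hypothetical P4 setting: x ~ N(m 1, v I) with N components, y = A x with a
# known A of size M x N, M < N, and no noise. theta = (m, v).
rng = np.random.default_rng(3)
N, M = 400, 200
A = rng.normal(size=(M, N))
x_true = rng.normal(2.0, 1.5, size=N)
y = A @ x_true

ones = np.ones(N)
B = A.T @ np.linalg.inv(A @ A.T)        # A^t (A A^t)^{-1}
P0 = np.eye(N) - B @ A                  # projector onto the null space of A
tr_P0 = np.trace(P0)                    # equals N - M

m, v = 0.0, 1.0                         # theta^(0)
for _ in range(200):
    # E-step: x | y; theta^(k) is Gaussian with
    #   mean x_hat = m 1 + B (y - m A 1)   (the estimate of x in eq. 17)
    #   cov  v P0
    x_hat = m * ones + B @ (y - m * (A @ ones))
    # M-step: maximize E{ln p(x; m, v) | y; theta^(k)} over (m, v);
    # E||x - m 1||^2 = ||x_hat - m 1||^2 + trace of the posterior covariance
    m_new = x_hat.mean()
    v = (np.sum((x_hat - m_new) ** 2) + v * tr_P0) / N
    m = m_new

print(m, v)
```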
2.4. Bayesian Approach

Problem P5: Consider again problems P3 or P4, but now assume that the observations y are corrupted by noise: $y = Ax + b$.

The main tool here is the Bayesian approach, where we use the data-unknown relation and the noise probability distribution to define the likelihood $p(y|x;\theta_1) = p_b(y - Ax;\theta_1)$ and combine it with the prior $p(x;\theta_2)$ through Bayes' rule to obtain the posterior law

$$p(x|y;\theta_1,\theta_2) = \frac{p(y|x;\theta_1)\, p(x;\theta_2)}{m(y;\theta_1,\theta_2)}, \qquad (18)$$

where

$$m(y;\theta_1,\theta_2) = \int p(y|x;\theta_1)\, p(x;\theta_2)\, dx. \qquad (19)$$

The posterior law $p(x|y;\theta_1,\theta_2)$ contains all the information available on x. We can then use it to make any inference on x. For example, we can define the following point estimators:

— Maximum a posteriori (MAP):

$$\hat x = \arg\max_x \{p(x|y;\theta_1,\theta_2)\} \qquad (20)$$

— Posterior Mean (PM):

$$\hat x = E_{x|y}\{x\} = \int x\, p(x|y;\theta_1,\theta_2)\, dx \qquad (21)$$

— Marginal Posterior Modes (MPM):

$$\hat x_i = \arg\max_{x_i} \{p(x_i|y;\theta_1,\theta_2)\}, \qquad (22)$$

where

$$p(x_i|y;\theta_1,\theta_2) = \int p(x|y;\theta_1,\theta_2)\, dx_1 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_n. \qquad (23)$$
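The three estimators (20) to (22) can disagree. A tiny hypothetical discrete posterior over two binary components makes this concrete: the joint mode is (0, 0), the mode of the marginal of $x_1$ is 1, and the posterior mean is not even a point of the support.

```python
import numpy as np

# Hypothetical posterior p(x1, x2 | y) with x1, x2 in {0, 1}; rows index x1.
post = np.array([[0.4, 0.0],
                 [0.3, 0.3]])
values = np.array([0.0, 1.0])

# MAP, eq. (20): mode of the joint posterior
x_map = np.unravel_index(np.argmax(post), post.shape)

# PM, eq. (21): posterior mean of each component
p_x1, p_x2 = post.sum(axis=1), post.sum(axis=0)     # marginals, eq. (23)
x_pm = (values @ p_x1, values @ p_x2)

# MPM, eq. (22): mode of each marginal
x_mpm = (np.argmax(p_x1), np.argmax(p_x2))
print(x_map, x_pm, x_mpm)
```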

However, in practice, we face two great difficulties:

— How to assign the probability laws $p(y|x;\theta_1)$ and $p(x;\theta_2)$?

— How to determine the parameters $\theta = (\theta_1, \theta_2)$?

For the first, we can use either the ME principle when possible, or some invariance properties combined with practical, scientific or engineering reasoning. For the second, there are more specific tools, all based on the joint posterior probability law

$$p(x,\theta|y) \propto p(y|x,\theta)\, p(x|\theta)\, p(\theta) \propto p(x,y|\theta)\, p(\theta) \propto p(x|y,\theta)\, p(y|\theta)\, p(\theta).$$
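The following are some known schemes:

• Joint Maximum a posteriori (JMAP):

$$(\hat\theta, \hat x) = \arg\max_{(\theta,x)} \{p(x,\theta|y)\}$$

• Generalized Maximum Likelihood (GML):

$$\hat x^{(k)} = \arg\max_x \{p(x|y;\hat\theta^{(k-1)})\}$$
$$\hat\theta^{(k)} = \arg\max_\theta \{p(\hat x^{(k)}|y,\theta)\, p(\theta)\}$$

• Marginalized Maximum Likelihood (MML):

$$\hat\theta = \arg\max_\theta \{p(y;\theta)\} = \arg\max_\theta \left\{\int p(y|x;\theta)\, p(x;\theta)\, dx\right\}$$
$$\hat x = \arg\max_x \{p(x|y;\hat\theta)\}$$

(Block diagram: $y \to$ ML $\to \hat\theta \to$ MAP $\to \hat x$.)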

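• MML-EM:

An analytic expression for $p(y;\theta)$ is rarely available. Consequently, considering $[y, x]$ as the complete data and y as the incomplete data, we can use the EM algorithm to obtain the following scheme:

$$\text{E:} \quad Q(\theta;\hat\theta^{(k)}) = E_{x|y;\hat\theta^{(k)}}\{\ln p(x,y;\theta)\}$$
$$\text{M:} \quad \hat\theta^{(k+1)} = \arg\max_\theta \{Q(\theta;\hat\theta^{(k)})\}$$

(Block diagram: $y \to$ ML-EM $\to \hat\theta \to$ MAP $\to \hat x$.)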
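A minimal sketch of MML-EM on a hypothetical scalar model $y_i = x_i + b_i$ with $x_i \sim N(0, \theta)$ and $b_i \sim N(0, \sigma^2)$, $\sigma^2$ known. Here $p(y;\theta)$ happens to be available ($y_i \sim N(0, \theta + \sigma^2)$), which is convenient for checking that the EM iterations reach the MML estimate; the last line applies the MAP step with $\hat\theta$.

```python
import numpy as np

# Hypothetical model: y_i = x_i + b_i, x_i ~ N(0, theta), b_i ~ N(0, s2), s2 known.
rng = np.random.default_rng(4)
theta_true, s2, n = 2.0, 1.0, 50_000
y = rng.normal(0.0, np.sqrt(theta_true), n) + rng.normal(0.0, np.sqrt(s2), n)

theta = 0.5                                   # theta^(0)
for _ in range(500):
    gain = theta / (theta + s2)
    post_mean = gain * y                      # E{x_i | y_i; theta^(k)}
    post_var = gain * s2                      # Var{x_i | y_i; theta^(k)}
    # M-step: only ln p(x; theta) depends on theta in ln p(x, y; theta),
    # so theta^(k+1) is the mean of E{x_i^2 | y_i; theta^(k)}
    theta = np.mean(post_mean ** 2) + post_var

# Closed-form MML for this toy model, used only as a check
theta_mml = max(np.mean(y ** 2) - s2, 0.0)

# MAP estimate of x with the estimated hyperparameter
x_map = theta / (theta + s2) * y
print(theta, theta_mml)
```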



3. Data fusion 

In this section we consider some simple data fusion problems and analyze how the previous schemes can be used to solve them.
3.1. Sensors without noise 

Problem P6: The sensor C1 gives N samples $x_a = \{x_1, \ldots, x_N\}$ of X and stops. The sensor C2 gives M samples $y_b = \{y_1, \ldots, y_M\}$ related to x by $y = Ax + b$. We are asked to predict the unobserved samples $x_b = \{x_{N+1}, \ldots, x_{N+M}\}$ of X.

                 x_a                     x_b
      C1 :   x_1, ..., x_N            ... ? ...
      C2 :        ...               y_1, ..., y_M
                 y_a                     y_b

(Block diagram: $y_b \to$ Fusion? $\to x_b$.)
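We can propose the following solutions:

— Use $x_a$ to estimate $\theta$, the parameters of $p(x;\theta)$, and then use it to estimate $x_b$ from $y_b$:

$$\hat\theta = \arg\max_\theta \{L_a(\theta) = \ln p(x_a;\theta)\}$$
$$\hat x_b = \arg\max_{x_b} \{p(x_b|y_b;\hat\theta)\}$$

(Block diagram: $x_a \to$ ML $\to \hat\theta$; $(\hat\theta, y_b) \to$ MAP $\to \hat x_b$.)

— Use both $x_a$ and $y_b$ to estimate $x_b$:

$$\hat\theta = \arg\max_\theta \{L_a(\theta) = \ln p(x_a;\theta)\}$$
$$(\hat x_b, \hat\theta) = \arg\max_{(x_b,\theta)} \{p(x_b,\theta|x_a,y_b)\}$$

(Block diagram: $x_a \to$ ML $\to \hat\theta$; $(x_a, y_b, \hat\theta) \to$ JMAP, GML or ML-EM $\to \hat x_b$.)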
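The first solution for P6 can be sketched as follows, assuming (hypothetically) $A = I$, a Gaussian law $p(x;\theta)$ with $\theta = (\mu, v)$, and a known noise variance. With these choices the MAP estimate of $x_b$ is a shrinkage of $y_b$ toward $\hat\mu$; the comparison with using $y_b$ alone is only a sanity check on simulated data.

```python
import numpy as np

# Hypothetical P6 setting with A = I: x ~ N(mu, v), y = x + b, b ~ N(0, s2)
rng = np.random.default_rng(5)
mu_true, v_true, s2 = 3.0, 1.0, 0.5
N, M = 2000, 5000
x_a = rng.normal(mu_true, np.sqrt(v_true), N)      # sensor C1
x_b = rng.normal(mu_true, np.sqrt(v_true), M)      # unobserved samples
y_b = x_b + rng.normal(0.0, np.sqrt(s2), M)        # sensor C2

# Step 1: ML estimate of theta = (mu, v) from x_a
mu_hat, v_hat = x_a.mean(), x_a.var()

# Step 2: MAP estimate of x_b from y_b. In this Gaussian case it equals the
# posterior mean, a weighted combination of the prior mean and the data.
w = v_hat / (v_hat + s2)
x_b_hat = mu_hat + w * (y_b - mu_hat)

mse_fused = np.mean((x_b_hat - x_b) ** 2)
mse_raw = np.mean((y_b - x_b) ** 2)                # y_b used alone
print(mse_fused, mse_raw)
```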




3.2. Fusion of homogeneous data 






Problem P7: We have two types of data on the same unknown x, both related to it through linear models:

$$y = H_1 x + b_1$$
$$z = H_2 x + b_2$$



For example, consider an X-ray tomography problem where x represents the mass density of the object and where y and z represent, respectively, a high-resolution projection and a low-resolution projection.
We can directly use the Bayesian approach to solve this problem:

$$p(x|y,z) = \frac{p(y,z|x)\, p(x)}{p(y,z)}$$



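Actually, the main difficulty here is to assign $p(y,z|x)$. If we assume that the errors associated with the two sets of data are independent, then $p(y,z|x) = p(y|x)\, p(z|x)$ and the calculation becomes easier. For the purpose of illustration, assume the following:

$$p(y|x;\sigma_1^2) \propto \exp\left[-\frac{1}{2\sigma_1^2}\|y - H_1 x\|^2\right]$$
$$p(z|x;\sigma_2^2) \propto \exp\left[-\frac{1}{2\sigma_2^2}\|z - H_2 x\|^2\right]$$
$$p(x;m,\Sigma) \propto \exp\left[-\frac{1}{2}(x - m)^t \Sigma^{-1} (x - m)\right]$$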



Assume further that the hyperparameters $(\sigma_1^2, \sigma_2^2, m, \Sigma)$ are given. Then we can use, for example, the MAP estimate, given by

$$\hat x = \arg\max_x \{p(x|y,z)\} = \arg\min_x \{J(x) = J_1(x) + J_2(x) + J_3(x)\}$$

with

$$J_1(x) = \frac{1}{2\sigma_1^2}\|y - H_1 x\|^2, \qquad J_2(x) = \frac{1}{2\sigma_2^2}\|z - H_2 x\|^2, \qquad J_3(x) = \frac{1}{2}(x - m)^t \Sigma^{-1} (x - m).$$
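Because all three terms of J are quadratic, setting $\nabla J(x) = 0$ gives the linear system $(H_1^t H_1/\sigma_1^2 + H_2^t H_2/\sigma_2^2 + \Sigma^{-1})\,\hat x = H_1^t y/\sigma_1^2 + H_2^t z/\sigma_2^2 + \Sigma^{-1} m$. The sketch below solves it with NumPy; `fuse_map` is a hypothetical helper name, and the operators (identity for the high-resolution data, pairwise averaging for the low-resolution data), the test signal and the variances are illustrative assumptions.

```python
import numpy as np


def fuse_map(y, z, H1, H2, s1, s2, m, Sigma):
    """MAP estimate for P7 with Gaussian noises and prior (s1, s2 are variances)."""
    Si = np.linalg.inv(Sigma)
    lhs = H1.T @ H1 / s1 + H2.T @ H2 / s2 + Si
    rhs = H1.T @ y / s1 + H2.T @ z / s2 + Si @ m
    return np.linalg.solve(lhs, rhs)


rng = np.random.default_rng(6)
n = 64
x_true = np.sin(np.linspace(0.0, 3.0 * np.pi, n))
H1 = np.eye(n)                                   # "high-resolution" data
H2 = np.kron(np.eye(n // 2), [[0.5, 0.5]])       # "low-resolution": pair averages
s1, s2 = 0.2, 0.01                               # noise variances
y = H1 @ x_true + rng.normal(0.0, np.sqrt(s1), n)
z = H2 @ x_true + rng.normal(0.0, np.sqrt(s2), n // 2)
m, Sigma = np.zeros(n), np.eye(n)

x_hat = fuse_map(y, z, H1, H2, s1, s2, m, Sigma)
print(np.mean((x_hat - x_true) ** 2), np.mean((y - x_true) ** 2))
```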



However, in practical applications, the two sets of data often come from different physical processes.
3.3. Real data fusion problems 

Consider a more realistic data fusion problem, where we have two different kinds of data. As an example, assume a tomographic image reconstruction problem where we have a set of data y obtained by X-ray and a set of data z obtained by an ultrasound probing system. The X-ray data are related to the mass density x of the matter, while the ultrasound data are related to the acoustic reflectivity r of the matter. Assume further that we have linear relations both between y and x and between z and r. Then we have:

$$y = H_1 x + b_1$$
$$z = H_2 r + b_2$$
Assuming that the two sets of data are independent, we can again use Bayes' rule, which now becomes

$$p(x,r|y,z) = \frac{p(y,z|x,r)\, p(x,r)}{p(y,z)} = \frac{p(y|x)\, p(z|r)\, p(x,r)}{p(y,z)}$$

with

$$p(y,z) = \iint p(y|x)\, p(z|r)\, p(x,r)\, dr\, dx.$$

Here also the main difficulty is the assignment of the probability laws $p(y|x)$, $p(z|r)$ and, more specifically, $p(x,r)$.

Actually, if we could find a mathematical relation between r and x, then the problem would become the same as in the preceding case. To see this, assume that, for some physical reason, we can find a relation such as $r_j = g(x_{j+1} - x_j)$ with g a monotonically increasing function. This expresses, for example, the fact that in areas where the mass density of the matter changes significantly, both x and r change. If g were a linear function (an unrealistic hypothesis), then we would have $r = Gx$ and

$$y = H_1 x + b_1$$
$$z = H_2 G x + b_2,$$

which is again a problem of the form P7.

For more realistic cases, we need a method that does not use a physically based explicit expression of g. One approach, proposed and used by Gautier et al. [...], is based on a compound Markovian model where the body object o is assumed to be composed of three related quantities:

$$o = \{r, x\} = \{q, a, x\}$$

where q is a binary vector representing the positions of the discontinuities (edges) in the body and a is a vector containing the reflectivity values, such that

$$q_j = 0 \;\Rightarrow\; r_j = 0, \qquad q_j = 1 \;\Rightarrow\; r_j = a_j, \qquad \text{and} \qquad r_j = \begin{cases} g(x_{j+1} - x_j) & \text{if } |x_{j+1} - x_j| > \alpha, \\ 0 & \text{otherwise,} \end{cases}$$

and g is any monotonically increasing function.
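The deterministic part of this compound model can be sketched as a function that labels the edges of a 1-D profile. The threshold `alpha` and the identity default for g are placeholders (the text only requires g to be monotonically increasing), and `compound_labels` is a hypothetical helper name.

```python
import numpy as np


def compound_labels(x, alpha, g=lambda t: t):
    """Edge labels q and reflectivities r of a 1-D profile x.

    q_j = 1 where |x_{j+1} - x_j| > alpha (a discontinuity), 0 elsewhere;
    r_j = g(x_{j+1} - x_j) where q_j = 1 and 0 elsewhere, so a_j = r_j on edges.
    The default g(t) = t is only a placeholder for a monotonically increasing g.
    """
    d = np.diff(x)
    q = (np.abs(d) > alpha).astype(int)
    r = np.where(q == 1, g(d), 0.0)
    return q, r


x = np.array([1.0, 1.0, 1.1, 3.0, 3.0, 2.9, 0.5, 0.5])
q, r = compound_labels(x, alpha=0.5)
print(q)
print(r)
```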
With this model we can write

$$p(x,r) = p(x,a,q) = p(x|a,q)\, p(a|q)\, p(q)$$

and, using Bayes' rule, we have

$$p(x,a,q|y,z) \propto p(y,z|x,a,q)\, p(x,a,q) = p(y,z|x,a,q)\, p(x|a,q)\, p(a|q)\, p(q).$$

We illustrate this approach by making the following assumptions:

— Conditional independence of y and z: $p(y,z|x,a,q) = p(y|x)\, p(z|a)$

— Gaussian laws for $b_1$ and $b_2$:

$$p(y|x;\sigma_1^2) \propto \exp\left[-\frac{1}{2\sigma_1^2}\|y - H_1 x\|^2\right], \qquad p(z|a;\sigma_2^2) \propto \exp\left[-\frac{1}{2\sigma_2^2}\|z - H_2 a\|^2\right]$$

— Bernoulli law for q: $p(q) \propto \prod_{i=1}^{n} \lambda^{q_i} (1 - \lambda)^{1 - q_i}$

— Gaussian law for $a|q$: $p(a|q) \propto \exp[\,\ldots\,]$, with $Q = \mathrm{diag}[q_1, \ldots, q_n]$

— Markovian model for x: $p(x|a,q) \propto \exp[-U(x|a,q)]$
Then, based on

$$p(x,a,q|y,z) \propto p(y|x)\, p(z|a)\, p(x|a,q)\, p(a|q)\, p(q),$$

we can propose the following schemes:

— Simultaneous estimation of all the unknowns with the joint MAP estimation (JMAP):

$$(\hat x, \hat a, \hat q) = \arg\max_{(x,a,q)} \{p(x,a,q|y,z)\}$$



— First estimate the positions of the discontinuities q and then use them to estimate x and a:

$$\hat q = \arg\max_q \{p(q|y,z)\}$$
$$(\hat x, \hat a) = \arg\max_{(x,a)} \{p(x,a|y,z,\hat q)\}$$

(Block diagram: $(y, z) \to$ Det. $\to \hat q \to$ Est. $\to (\hat x, \hat a)$.)



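— First estimate the positions of the discontinuities q using only z and then use them to estimate x and a:

$$\hat q = \arg\max_q \{p(q|z)\}$$
$$(\hat x, \hat a) = \arg\max_{(x,a)} \{p(x,a|y,z,\hat q)\}$$

(Block diagram: $z \to$ Det. $\to \hat q$; $(y, z, \hat q) \to$ Est. $\to (\hat x, \hat a)$.)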



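— First estimate q and a using only z and then use them to estimate x:

$$(\hat q, \hat a) = \arg\max_{(q,a)} \{p(q,a|z)\}$$
$$\hat x = \arg\max_x \{p(x|y,\hat a,\hat q)\}$$

(Block diagram: $z \to$ Det. & Est. $\to (\hat q, \hat a)$; $(y, \hat q, \hat a) \to$ Est. $\to \hat x$.)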
— First estimate only q using z, then estimate a using q and z, and finally estimate x using q, a and y:

$$\hat q = \arg\max_q \{p(q|z)\}$$
$$\hat a = \arg\max_a \{p(a|z,\hat q)\}$$
$$\hat x = \arg\max_x \{p(x|y,\hat a,\hat q)\}$$

(Block diagram: $z \to$ Det. $\to \hat q \to$ Est. $\to \hat a$; $(y, \hat q, \hat a) \to$ Est. $\to \hat x$.)



— First estimate only q using z and then estimate x using q and the data y:

$$\hat q = \arg\max_q \{p(q|z)\}$$
$$\hat x = \arg\max_x \{p(x|y,\hat q)\}$$

(Block diagram: $z \to$ Det. $\to \hat q$; $(y, \hat q) \to$ Est. $\to \hat x$.)

Two more realistic solutions are:



Proposed method 1:

Estimate r using only z, and estimate x and q using r and y:

(Block diagram: $z \to$ Est. $\to \hat r$; $(\hat r, y) \to$ Reconstruction $\to (\hat x, \hat q)$.)

$$p(r|z) \propto p(z|r)\, p(r), \qquad p(x,q|r,y) \propto p(y|x)\, p(x,q|r)$$

For the first part, with the assumptions made, we have

$$\hat r = \arg\max_r \{p(r|z)\} = \arg\min_r \{J_1(r|z)\}$$

with

$$J_1(r|z) = \|z - H_2 r\|^2 + \lambda_1 \sum_j (r_{j+1} - r_j)^2,$$



and for the second part we have

$$(\hat x, \hat q) = \arg\max_{(x,q)} \{p(x,q|y,\hat r)\} = \arg\min_{(x,q)} \{J_2(x,q|y,\hat r)\}$$

with

$$J_2(x,q|y,r) = \|y - H_1 x\|^2 + \lambda_2 \sum_j (1 - q_j)(x_{j+1} - x_j)^2 + \alpha_1 \sum_j q_j (1 - r_j) + \alpha_2 \sum_j q_j.$$



This last optimization is still difficult, because the binary variables q and the continuous variables x must be optimized jointly. An easier solution is given below.
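Although the joint minimization of $J_2$ is hard in general, a simple local scheme follows from its structure: for fixed q, $J_2$ is quadratic in x, and for fixed x, it is separable in the $q_j$, with $q_j = 1$ exactly when $\lambda_2 (x_{j+1} - x_j)^2 > \alpha_1 (1 - r_j) + \alpha_2$. The sketch below alternates these two exact steps, so $J_2$ never increases, but it may stop at a local minimum. The closed-form $\hat r$ minimizes the quadratic $J_1$. Function names, the 1-D test signal, the identity operators and all weights are hypothetical.

```python
import numpy as np


def diff_matrix(n):
    return np.diff(np.eye(n), axis=0)            # (D v)_j = v_{j+1} - v_j


def estimate_r(z, H2, lam1):
    # closed-form minimizer of J1(r|z) = |z - H2 r|^2 + lam1 sum (r_{j+1} - r_j)^2
    D = diff_matrix(H2.shape[1])
    return np.linalg.solve(H2.T @ H2 + lam1 * D.T @ D, H2.T @ z)


def J2(x, q, y, H1, r, lam2, a1, a2):
    d = np.diff(x)
    return (np.sum((y - H1 @ x) ** 2) + lam2 * np.sum((1 - q) * d ** 2)
            + a1 * np.sum(q * (1 - r)) + a2 * np.sum(q))


def estimate_x_q(y, H1, r, lam2, a1, a2, iters=20):
    """Alternating minimization of J2 over x (quadratic) and q (separable)."""
    n = H1.shape[1]
    D = diff_matrix(n)
    q = np.zeros(n - 1)
    history = []
    for _ in range(iters):
        # x-step: exact minimizer of J2 for fixed q
        x = np.linalg.solve(H1.T @ H1 + lam2 * D.T @ ((1 - q)[:, None] * D),
                            H1.T @ y)
        # q-step: q_j = 1 exactly when switching it on lowers J2
        q = (lam2 * np.diff(x) ** 2 > a1 * (1 - r) + a2).astype(float)
        history.append(J2(x, q, y, H1, r, lam2, a1, a2))
    return x, q, history


rng = np.random.default_rng(7)
n = 100
x_true = np.where(np.arange(n) < 50, 0.0, 2.0)   # density with one jump
r_true = np.zeros(n - 1)
r_true[49] = 1.0                                  # reflectivity at that jump
H1, H2 = np.eye(n), np.eye(n - 1)                 # toy observation operators
y = H1 @ x_true + rng.normal(0.0, 0.1, n)
z = H2 @ r_true + rng.normal(0.0, 0.05, n - 1)

r_hat = estimate_r(z, H2, lam1=0.1)
x_hat, q_hat, hist = estimate_x_q(y, H1, r_hat, lam2=5.0, a1=2.0, a2=0.1)
print(np.flatnonzero(q_hat), hist[-1])
```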
Proposed method 2:

Use the ultrasound data z to detect the locations of some of the boundaries, and use the X-ray data y to make an intensity image preserving the positions of these discontinuities:

(Block diagram: $z \to$ Est. $\to$ boundary positions $\hat q$; $(y, \hat q) \to$ Est. $\to \hat x$.)



Here, we made slightly different assumptions about the distributions of r and x. Actually, a generalized Gaussian distribution in place of a Gaussian gives a good compromise between discontinuity preservation and ease of implementation. A typical choice for the first case is

$$\hat r = \arg\max_r \{p(r|z)\} = \arg\min_r \{J_1(r|z)\} \quad \text{with} \quad J_1(r|z) = \|z - H_2 r\|^2 + \lambda_1 \|r\|_p^p, \quad 1 < p < 2,$$

and for the second case is

$$\hat x = \arg\max_x \{p(x|y,\hat q)\} = \arg\min_x \{J_2(x|y,\hat q)\} \quad \text{with} \quad J_2(x|y,q) = \|y - H_1 x\|^2 + \lambda_2 \sum_j (1 - q_j)\,|x_{j+1} - x_j|^p, \quad 1 < p < 2.$$
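For fixed $\hat q$, $J_2$ is convex in x when $1 < p < 2$ but not quadratic. One common way to minimize it, swapped in here rather than taken from the text, is iteratively reweighted least squares (IRLS): with $|t|^p$ smoothed as $(t^2 + \varepsilon)^{p/2}$ (the $\varepsilon$ is an added assumption), each step minimizes a quadratic majorizer, so the smoothed criterion decreases monotonically. The signal, the edge location in `q_hat`, the identity $H_1$ and the weights are hypothetical.

```python
import numpy as np


def J2_smoothed(x, y, H1, q, lam2, p, eps):
    # J2(x | y, q) with |t|^p replaced by (t^2 + eps)^(p/2)
    d = np.diff(x)
    return (np.sum((y - H1 @ x) ** 2)
            + lam2 * np.sum((1 - q) * (d ** 2 + eps) ** (p / 2)))


def estimate_x_lp(y, H1, q, lam2=2.0, p=1.2, eps=1e-6, iters=50):
    """IRLS sketch for the generalized-Gaussian J2 with known edges q."""
    n = H1.shape[1]
    D = np.diff(np.eye(n), axis=0)                  # (D x)_j = x_{j+1} - x_j
    HtH, Hty = H1.T @ H1, H1.T @ y
    x = np.linalg.solve(HtH + lam2 * D.T @ D, Hty)  # quadratic initialization
    history = [J2_smoothed(x, y, H1, q, lam2, p, eps)]
    for _ in range(iters):
        d = D @ x
        # slope of s -> (s + eps)^(p/2) at s = d^2; its tangent line is a
        # majorizer because this function is concave in s for p < 2
        w = (1 - q) * (p / 2) * (d ** 2 + eps) ** (p / 2 - 1)
        x = np.linalg.solve(HtH + lam2 * D.T @ (w[:, None] * D), Hty)
        history.append(J2_smoothed(x, y, H1, q, lam2, p, eps))
    return x, history


rng = np.random.default_rng(8)
n = 100
x_true = np.where(np.arange(n) < 50, 0.0, 2.0)
y = x_true + rng.normal(0.0, 0.2, n)              # H1 = I in this toy case
q_hat = np.zeros(n - 1)
q_hat[49] = 1.0                                   # edge between x_49 and x_50
x_hat, hist = estimate_x_lp(y, np.eye(n), q_hat)
print(hist[0], hist[-1], x_hat[50] - x_hat[49])
```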



It is not the aim of this paper to go into more detail on these methods. The interested reader can refer to [...].



4. Conclusions 

To conclude briefly: 

— ME can be used when we want to assign a probability law $p(x)$ from some expected values.

— ML can be used when we have a parametric form of the probability law $p(x;\theta)$, we have access to direct observations x of X, and we want to estimate the parameters $\theta$.

— ML-EM extends ML to the case of incomplete observations.

— When the observed data are noisy, the Bayesian approach is the most appropriate.

— For practical data fusion problems, the Bayesian approach seems to give all the necessary tools.

— Compound Markov models are convenient for representing signals and images in a Bayesian approach to data fusion.

— The Bayesian approach is coherent and easy to understand. However, in real applications, much remains to be done to implement it:

 - Assignment or choice of the prior laws
 - Efficient optimization of the obtained criteria
 - Estimation of the hyperparameters
 - Interpretation of the obtained results.